Tamil Morphological Analyzer Using Support Vector Machines
Abstract
Morphological analysis is the process of analyzing the internal structure of words, using grammatical features and properties. Like other Dravidian languages, Tamil is highly agglutinative and has a rich morphology. Most current morphological analyzers for Tamil segment a word to generate all possible candidates, then apply either grammar rules or tagging mismatch during post-processing to select the best candidate. This paper presents a morphological engine for Tamil that uses grammar rules and an annotated corpus to generate all possible candidates. A support vector machine classifier then determines the most probable morphological deconstruction of a given word. Lexical labels, their respective frequency scores, average length, and suffixes are used as features. The system achieves an accuracy of 98.73% and an F-measure of 0.943, both higher than the figures reported in similar research.
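The pipeline described in the abstract (generate candidate segmentations, then let a support vector machine pick the most probable one) might be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the suffix list, lexicon, frequencies, and gold analyses are hypothetical transliterated examples that ignore sandhi, the four features only loosely mirror those named in the abstract, and a small hand-written linear SVM stands in for whatever SVM configuration the paper actually uses.

```python
# Toy sketch only: suffixes, lexicon, and gold analyses are hypothetical
# transliterated examples (sandhi ignored), not data from the paper.

SUFFIXES = ["kal", "ai", "il", "ukku"]
LEXICON = {"maram": 5, "viidu": 4}  # stem -> assumed corpus frequency

def candidates(word):
    """Enumerate (stem, suffixes) splits by repeatedly stripping known suffixes."""
    results = [(word, [])]
    frontier = [(word, [])]
    while frontier:
        stem, sufs = frontier.pop()
        for s in SUFFIXES:
            if stem.endswith(s) and len(stem) > len(s):
                cand = (stem[: -len(s)], [s] + sufs)
                results.append(cand)
                frontier.append(cand)
    return results

def features(stem, sufs):
    """Lexical label, frequency score, average morpheme length, suffix count, bias."""
    morphs = [stem] + sufs
    return [
        1.0 if stem in LEXICON else 0.0,
        LEXICON.get(stem, 0) / max(LEXICON.values()),
        sum(len(m) for m in morphs) / len(morphs) / 10.0,
        len(sufs) / 3.0,
        1.0,
    ]

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_linear_svm(data, epochs=200, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularised hinge loss (linear SVM)."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            violated = y * dot(w, x) < 1.0           # margin check before the step
            w = [wi * (1.0 - lr * lam) for wi in w]  # regularisation shrink
            if violated:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w

# Hypothetical annotated corpus: word -> gold (stem, suffixes) analysis.
GOLD = {
    "maramkalai": ("maram", ["kal", "ai"]),
    "maramukku": ("maram", ["ukku"]),
    "viiduil": ("viidu", ["il"]),
    "viidukal": ("viidu", ["kal"]),
}

def build_training_set(gold):
    """Label each candidate +1 if it matches the gold analysis, else -1."""
    data = []
    for word, analysis in gold.items():
        for cand in candidates(word):
            data.append((features(*cand), 1 if cand == analysis else -1))
    return data

def analyze(word, w):
    """Return the candidate with the highest SVM decision score."""
    return max(candidates(word), key=lambda c: dot(w, features(*c)))

weights = train_linear_svm(build_training_set(GOLD))
print(analyze("viidukalil", weights))
```

Ranking candidates by decision score, rather than classifying each one independently, guarantees that exactly one analysis is returned per word; whether the paper resolves ties or multiple positives this way is not stated in the abstract.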
Similar resources
AMRITA@FIRE-2014: Morpheme Extraction for Tamil using Machine Learning
This article presents the working methodology of a supervised approach to the Morpheme Extraction Task (MET) for Tamil at FIRE-2014. Tamil morphemes are extracted using a supervised machine learning algorithm, support vector machines.
Online Handwritten Character Recognition for Devanagari and Tamil Scripts Using Support Vector Machines
A Sequence Labeling Approach to Morphological Analyzer for Tamil Language
Morphological analysis is a basic process for any Natural Language Processing task. Morphology is the study of the internal structure of words. Morphological analysis retrieves the grammatical features and properties of a morphologically inflected word. Capturing the agglutinative structure of Tamil words with an automatic system is a challenging job. Generally, rule-based approaches are used for...
A Comparative Study of Extreme Learning Machines and Support Vector Machines in Prediction of Sediment Transport in Open Channels
The limiting velocity in open channels to prevent long-term sedimentation is predicted in this paper using a powerful soft computing technique known as Extreme Learning Machines (ELM). The ELM is a single Layer Feed-forward Neural Network (SLFNN) with a high level of training speed. The dimensionless parameter of limiting velocity which is known as the densimetric Froude number (Fr) is predicte...
A Novel Approach to Morphological Analysis for Tamil Language
This paper presents morphological analysis for the complex, agglutinative Tamil language using a machine learning approach. Morphological analysis is concerned with retrieving the structure, syntactic rules, morphological properties and the meaning of a morphologically complex word. The morphological structure of an agglutinative language is unique and capturing its complexity in a machine analyza...